ABSTRACT


To succeed in STEM, students need to learn to use visual repre- 
sentations. Most prior research has focused on conceptual 
knowledge about visual representations that is acquired via ver- 
bally mediated forms of learning. However, students also need 
perceptual fluency: the ability to rapidly and effortlessly translate 
among representations. Perceptual fluency is acquired via non- 
verbal, implicit learning processes. A challenge for instructional 
interventions that focus on implicit learning is to model students’ 
knowledge acquisition. Because implicit learning is non-verbal, 
we cannot rely on traditional methods, such as expert interviews 
or student think-alouds. This paper uses similarity learning, a 
machine learning method that can assess how people perceive 
similarity between visual representations. We used this approach 
to model how undergraduate students perceive similarity between 
visual representations of chemical molecules. The approach predicted students' similarity judgments with good accuracy and extends expert predictions of how students might perceive visual representations of molecules.
1. INTRODUCTION
Figure 1. Visual representations of chemical molecules: a: Lewis structure; b: ball-and-stick model; c: space-filling model; d: electrostatic potential map (EPM) of water.


Visual representations are ubiquitous instructional tools in sci- 
ence, technology, engineering, and math (STEM) domains [1, 2]. 
For example, instructors use the visual representations shown in 
Figure 1 to help students learn about chemical bonding. Yet, to a
novice student, these visual representations may not be helpful 
because the student may not know how to interpret the representa- 
tions. For instance, does the red color in the ball-and-stick figure 
(Figure 1-b) mean the same thing as in the electrostatic potential 
map (EPM; Figure 1-d)? (It does not.) 


Instructors often ask students to use visual representations that 
they have never seen before to make sense of concepts that they 
have not yet learned about [3, 4], an issue known as the represen- 
tation dilemma [5]. Hence, to succeed in STEM, students need 
representational competencies that enable them to use visual 
representations to make sense of and solve domain-relevant prob- 
lems [6, 7]. One crucial representational competency is the ability 
to interpret visual representations; that is, to map visual representations to the abstract concepts they depict [6, 8]. For example,
students need to understand how the representations in Figure 1 
show information about the molecule. For the Lewis structure (Figure 1-a), the student may map the unbonded electrons shown as dots to conceptual knowledge about polarity in chemical molecules and infer that the water molecule has a local negative charge near the Oxygen atom.


Educational technologies are particularly suitable to support rep- 
resentational competencies because they can provide adaptive 
support while students solve domain-relevant problems [9, 10]. 
Such adaptive support relies on a cognitive model that infers 
whether the student has learned target skills based on her/his 
interactions with the technology. Research shows that adapting 
instruction to students’ representational competencies can enhance 
those competencies [11] and learning of domain knowledge [12]. 


However, educational technologies for representational compe- 
tencies have two critical limitations. First, they typically focus on 
one set of representational competencies: students’ conceptual 
understanding of representations (e.g., the ability to explain how 
visual features depict concepts). This focus mirrors educational psychology research's focus on conceptual learning [6, 13]. Conceptual knowledge is invariably intertwined with a second type of representational competency: perceptual knowledge [14, 15], the ability to rapidly and effortlessly recognize conceptual information based on visual features of the representations. This ability results from implicit forms of learning. For example, expert chemists simply “see” that the molecules depicted in Figure 1 have a local negative charge near the Oxygen atom, without having to make an effortful conceptual inference.


Second, the few educational technologies that enhance perceptual fluency have limited adaptive capabilities: their perceptual supports rely solely on performance measures (e.g., accuracy, response times) to adapt to students' representational competencies [15, 16]. They do not use a cognitive model of the latent
skills that students acquire through perceptual learning. As a re- 
sult, they cannot provide specific feedback when students make 
mistakes. Decades of research showing that cognitive models can 
dramatically increase the effectiveness of educational technolo- 
gies [10, 17] suggest that we must address this limitation and 
create adaptive instruction for perceptual knowledge. 


These limitations likely result from cognitive modeling's traditional focus on explicit, verbally accessible knowledge. To develop cognitive models, researchers analyze how students think about target skills [9, 18], typically by asking students to verbalize their problem-solving steps [19, 20]. Yet, verbalization is not suitable for assessing perceptual learning processes, which are implicit and not verbally accessible [14, 21]. Therefore, instructional designers have to rely on “educated guesses” as to which visual features students may pay attention to. These educated guesses are based on the novice-expert literature, which documents that novices tend to rely on surface features (i.e., easily perceivable visual cues such as color and shape) to judge the similarity between stimulus items. By contrast, experts rely on visual features that are conceptually relevant and hence make more refined distinctions between visual features. Because these guesses are not derived from students' own perceptions, creating adaptive perceptual supports requires developing cognitive models for perceptual learning.


Our research takes a first step towards developing a cognitive 
model for perceptual learning by assessing students’ perceptual 
knowledge of a common visual representation in chemistry. In 
particular, we investigate research question 1: Which visual fea- 
tures do students focus on when presented with visual representa- 
tions? To address this question, we asked hundreds of students to 
judge the similarity between visual representations of molecules. 
We then used similarity learning, a machine learning method that provides a formal approach to investigating how people perceive similarity among visual stimuli. This method allowed us to estimate latent factors that account for the perceived similarity relationships between representations. Because we can map these latent factors to the visual features in the representations, this approach allows us to investigate which visual features are most salient to students' perceptions of similarity. Comparing these visual features to “educated guesses” allowed us to test research question 2: Do the visual features we identified as salient via metric learning correspond to visual features that students are expected to attend to based on the expert-novice literature on perceptual learning? In addition, we investigated a methodological research question 3: How many similarity judgments do we need to assess students' perceptual knowledge?


Although we address these questions in the context of a particular 
domain with a particular visual representation, this paper makes 
two important broader contributions. First, it provides an empiri- 
cal validation of the “educated guesses” that developers of percep- 
tual learning technologies typically rely on. Second, it establishes 
a methodology to assess perceptual knowledge that can serve as a 
basis for a cognitive model of perceptual learning. These contribu- 
tions build the foundation for the development of adaptive instruc- 
tion for perceptual knowledge and other implicit knowledge. 


2. EXPERIMENT 


2.1 Visual Representations of Molecules 

For our experiment, we selected visual representations of chemi- 
cal molecules common in undergraduate instruction. Lewis struc- 
ture representations are the most commonly used visual represen- 
tations in undergraduate chemistry textbooks. We reviewed text- 
books and online instructional materials and listed the frequency 
of all occurring molecules using their chemical names (e.g., H2O)
and common names (e.g., water). For our experiment, we chose 
the 50 most common molecules. 


First, we created educated guess features (Figure 2, yellow) that 
correspond to expert assessments of which visual features students 
may attend to when making similarity judgments. To obtain these 
educated guesses, we reviewed the literature on chemistry exper- 
tise [22, 23] and on perceptual learning [14, 24], and conducted 
learner-centered interviews with undergraduate and PhD students 
in chemistry [25]. We identified 6 educated guess features: num- 
ber of total letters, number of distinct letters, number of total 
bonds, number of single bonds, number of unbonded electrons, 
and molecule geometry (linear, planar, tetrahedral). 


To investigate which visual features drive students' similarity judgments, we quantitatively described the visual features of the molecule representations. To this end, we created feature vectors for each of the molecules (see Figure 2, red) that describe which visual features the representation contains (e.g., bond angles, the numbers of specific atoms, or the numbers of different atoms present). The feature vectors of our corpus of molecule representations contained a total of 110 features. The 50 feature vectors collectively form the matrix $X = [\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_{50}]$, where $\mathbf{x}_i$ is the feature vector for the $i$th molecule.

Figure 2. Example features for H2O and CO2 molecule representations, with educated guess features in yellow, feature vectors in red, and molecule vectors in blue.

Figure 3. Example of a similarity judgment task: given the molecule on the top, students were asked which of the two molecules at the bottom is most similar.


We also aggregated the values of each individual feature across all molecules into a molecule vector (Figure 2, blue). Each molecule vector consisted of 50 values describing how many times the feature occurred in each representation. As molecule vectors make up the rows of our matrix of 110 features by 50 molecules shown in Figure 2, we will refer to the molecule vector for the $j$th feature as $\mathbf{r}_j$. Thus, feature vectors provide a numeric description of the visual information present in each representation, whereas molecule vectors describe how each visual feature is distributed across all representations in the dataset.
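To make the two views of the data concrete, the minimal NumPy sketch below uses a tiny made-up matrix in place of the real 110 × 50 feature counts; the values and the helper names `feature_vector` and `molecule_vector` are hypothetical. Columns of X are feature vectors, rows are molecule vectors.

```python
import numpy as np

# Hypothetical toy data: 3 features (rows) x 4 molecule representations (columns).
# In the study, X would be 110 features x 50 molecule representations.
X = np.array([
    [2, 2, 2, 1],   # feature 0, counted in each representation
    [2, 0, 4, 0],   # feature 1
    [4, 4, 0, 6],   # feature 2
])

def feature_vector(X, i):
    """x_i: column i, the features present in molecule representation i."""
    return X[:, i]

def molecule_vector(X, j):
    """r_j: row j, the value of feature j across all molecule representations."""
    return X[j, :]

m, p = X.shape  # m features, p molecule representations
print(feature_vector(X, 1))   # [2 0 4]
print(molecule_vector(X, 2))  # [4 4 0 6]
```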


2.2 Similarity Judgment Tasks 

Students completed similarity judgment tasks that were presented as triplet comparisons (see Figure 3). Given a representation of a molecule (the “target-molecule”), students were asked to choose which of two other molecule representations (the “choice-molecules”) was most similar to the given one. For each task, the student chose the choice-molecule that he/she perceived to be more similar to the target-molecule. After each task, another triplet was generated uniformly at random from our corpus of molecule representations.
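The text states only that each new triplet was drawn uniformly at random. One plausible reading, sketched below, draws three distinct molecule indices without replacement and treats the first as the target-molecule; the function name and the use of Python's `random` module are our own assumptions, not NEXT's implementation.

```python
import random

def sample_triplet(n_molecules, rng=random):
    """Return (target, choice_a, choice_b): three distinct indices drawn uniformly at random."""
    target, choice_a, choice_b = rng.sample(range(n_molecules), 3)
    return target, choice_a, choice_b

# One simulated session of 50 tasks over a 50-molecule corpus.
rng = random.Random(0)
session = [sample_triplet(50, rng) for _ in range(50)]
```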


We delivered the similarity judgment tasks via NEXT, a cloud-based machine learning platform [26]. NEXT allows users to upload their own content and query participants to perform judgment tasks. It uses machine learning algorithms to automate data collection and analyze results. More information about the platform can be found at http://nextml.org. In NEXT, students first received a brief description of the study and then worked through a sequence of 50 similarity judgment tasks. Students were instructed that these tasks are not a test and that there is no right or wrong answer; they were simply asked about their personal perceptions of similarities among molecule representations.


2.3 Dataset 


Undergraduate students enrolled in an introductory chemistry 
course at a large U.S. university were invited to participate in a 
survey on learning with visual representations. The course had an enrollment of 781 students. Participation was voluntary. Altogether, we collected 26,180 responses from 563 (possibly non-unique) students. Of these students, 61.6% completed all 50 similarity judgment tasks; on average, students completed 46.5 tasks. Each similarity judgment in response to a triplet comparison task was associated with the feature vectors ($\mathbf{x}_i$) and molecule vectors ($\mathbf{r}_j$) of the three molecule representations, as described in Section 2.1.


3. ANALYSIS 


In the following, we describe how we used similarity learning to investigate which visual features drive students' similarity judgments. We first provide a brief introduction to similarity learning in general. Then, we describe how we applied it to our dataset in particular.


3.1 Introduction to Similarity Learning 

In general, the goal of similarity learning is to learn a similarity function $f$ that agrees with students' similarity judgments in the following sense: if item $i$ is judged to be more similar to $j$ than to $k$, then $f(i,j) < f(i,k)$. The function $f$ can be thought of as quantifying the perceived distance or dissimilarity between pairs. Alternatively, the function could quantify the perceptual similarity (inverse distance) between pairs, in which case $f(i,j) > f(i,k)$.
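As a small illustration of this agreement criterion, the sketch below checks whether a candidate dissimilarity function agrees with a set of judgments. Encoding a judgment as a tuple (i, j, k), meaning "i was judged more similar to j than to k", is our own convention.

```python
from typing import Callable, Sequence, Tuple

Triplet = Tuple[int, int, int]  # (i, j, k): item i judged more similar to j than to k

def agrees(f: Callable[[int, int], float], t: Triplet) -> bool:
    """True if the dissimilarity f places j closer to i than k is."""
    i, j, k = t
    return f(i, j) < f(i, k)

def agreement_rate(f: Callable[[int, int], float], judgments: Sequence[Triplet]) -> float:
    """Fraction of triplet judgments with which f agrees."""
    return sum(agrees(f, t) for t in judgments) / len(judgments)

# Usage: items placed on a line; dissimilarity is the absolute difference in position.
pos = {0: 0.0, 1: 1.0, 2: 5.0}
f = lambda a, b: abs(pos[a] - pos[b])
print(agreement_rate(f, [(0, 1, 2), (0, 2, 1)]))  # 0.5
```

For a similarity (inverse distance) function, the inequality in `agrees` would simply flip.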


People are better at providing ordinal (i.e., comparative) responses 
than at providing fine-grained quantitative judgments or ratings 
[27]. For example, when asked to compare the visual representa- 
tions in Figure 3, people find it easier to judge whether the target 
molecule is more similar to the left or the right choice molecule 
than to judge their similarity on a rating scale. However, it is 
challenging to machine-learn embeddings from comparisons due 
to the sheer number of possible triplet comparisons that could be 
made; the number of possible triplets grows in proportion to $n^3$. For example, in our case of $n = 50$ molecule representations, there are on the order of $50^3 = 125{,}000$ possible triplets. Researchers have observed that while triplet comparisons are easy to answer, they can become tedious and boring after extended sessions [28]. Since we hypothesize that perceived dissimilarities can be accurately represented in a $d$-dimensional space, it is reasonable to conjecture that if the embedding dimension is low (i.e., $d \ll n$), then there will be a high degree of redundancy among the triplet comparisons. In fact, researchers have observed that a small subset of these triplet comparisons often suffices to learn a reasonably accurate embedding, lending support to this conjecture [29-31].
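The $n^3$ figure is an order-of-magnitude bound. The quick calculation below, which assumes a triplet is a target plus two distinct choice-molecules, shows how the exact count depends on whether the left/right order of the two choices is distinguished.

```python
from math import comb

n = 50
cubic = n ** 3                    # order-of-magnitude bound
ordered = n * (n - 1) * (n - 2)   # target + ordered pair of distinct choices
unordered = n * comb(n - 1, 2)    # target + unordered pair of choices

print(cubic, ordered, unordered)  # 125000 117600 58800
```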


3.2 Similarity Learning Approaches 

We applied two similarity learning approaches in this paper: simi- 
larity learning by ranking [32] and non-metric multi-dimensional 
scaling. In both cases, we modelled the perceptual similarity be- 
tween molecules i and j as 


$$S_{ij} = \mathbf{x}_i^T A \mathbf{x}_j$$


Here $A$ is a symmetric matrix that parameterizes the model. The $(k,l)$th element of the matrix, denoted by $A_{kl}$, represents the importance of the interaction of feature $k$ and feature $l$ in the model. Since we assume $A$ is symmetric, $A_{kl} = A_{lk}$ and $S_{ij} = S_{ji}$.

Before introducing these approaches, let us define some notation. There are $N$ triplet comparisons. For the $n$th triplet, let $i_n$ denote the target-molecule and let $j_n$ and $k_n$ denote the two choice-molecules. Let $y_n$ denote the student's judgment; specifically, $y_n = +1$ if the student decided $j_n$ was more similar to $i_n$, and $y_n = -1$ otherwise. Each of the $p = 50$ diagrams also has $m$ associated features (e.g., numbers of different atoms, bonds, etc.; here, $m = 110$). Arrange the features for each molecule representation into an $m \times 1$ molecular feature vector, and the $m \times 1$ feature vectors into an $m \times p$ matrix, $X$. The $i$th column of $X$, denoted $\mathbf{x}_i$, contains the $m$ features for molecule $i$. The $j$th row of $X$, denoted $\mathbf{r}_j$, is a molecule vector for feature $j$ containing the value of feature $j$ for all 50 representations.
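Because every $S_{ij}$ has the same bilinear form, all pairwise similarities can be computed at once as $S = X^T A X$. The sketch below uses random stand-in values (not the study's feature counts or a learned $A$) to check that this matrix form matches the per-pair formula and that a symmetric $A$ yields a symmetric $S$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 110, 50                                      # features x molecule representations
X = rng.integers(0, 4, size=(m, p)).astype(float)  # stand-in feature counts (made up)

B = rng.normal(size=(m, m))
A = (B + B.T) / 2                                   # any symmetric parameter matrix

S = X.T @ A @ X                                     # S[i, j] = x_i^T A x_j for all pairs

i, j = 3, 7
assert np.isclose(S[i, j], X[:, i] @ A @ X[:, j])   # agrees with the per-pair formula
assert np.allclose(S, S.T)                          # symmetric A gives S_ij = S_ji
```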


3.2.1 Approach 1: Similarity Learning by Ranking 
This approach learns matrix A in our model of perceptual similari- 
ty directly from triplet responses via linear regression. 


$$S_{ij} = \mathbf{x}_i^T A \mathbf{x}_j$$
where $\mathbf{x}_i$ and $\mathbf{x}_j$ are the $m \times 1$ feature vectors of molecule representations $i$ and $j$. The matrix $A$ is $m \times m$, and the metric learning problem is to estimate the $A$ that minimizes the number of disagreements between the ranking predictions for each triplet (i.e., either $S_{ij} > S_{ik}$ or vice versa) and the comparative judgments collected from the students, as proposed by [32].


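The first step in this analysis was to estimate $A$. Formally, the estimation of $A$ can be written as the following optimization problem. Let $\mathcal{S}_m$ be the set of all $m \times m$ symmetric matrices. Solve for the $A$ that minimizes:

$$\hat{A} = \arg\min_{A \in \mathcal{S}_m} \sum_{n=1}^{N} \left( y_n - \mathbf{x}_{i_n}^T A \mathbf{x}_{j_n} + \mathbf{x}_{i_n}^T A \mathbf{x}_{k_n} \right)^2$$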


where the superscript T denotes transposition. The matrix $A$ that minimizes this sum of squared errors weights the interactions between the diagram features so as to best predict perceptual similarity judgments. In general, the solution $A$ will place some weight on all $m$ features. We anticipate that visual features that are not salient do not strongly affect students' similarity judgments and therefore have lower weights in $A$.
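One way to solve this least-squares problem with standard tools, sketched here as an illustration rather than the authors' code, uses the identity $\mathbf{x}_i^T A \mathbf{x}_j = \langle A, \mathbf{x}_i \mathbf{x}_j^T \rangle$ (Frobenius inner product): each triplet becomes one row of an ordinary linear regression whose unknowns are the entries of $A$. Symmetrizing each row keeps the minimum-norm solution returned by `np.linalg.lstsq` inside the space of symmetric matrices. The helper names and the simulated data are hypothetical.

```python
import numpy as np

def triplet_design(X, triplets):
    """One regression row per triplet (i, j, k): vec(sym(x_i (x_j - x_k)^T)).

    For a symmetric A, row @ vec(A) equals x_i^T A x_j - x_i^T A x_k = S_ij - S_ik.
    """
    rows = []
    for i, j, k in triplets:
        M = np.outer(X[:, i], X[:, j] - X[:, k])
        rows.append(((M + M.T) / 2).ravel())
    return np.array(rows)

def fit_A(X, triplets, y):
    """Least-squares estimate of a symmetric A from judgments y_n in {+1, -1}."""
    m = X.shape[0]
    a, *_ = np.linalg.lstsq(triplet_design(X, triplets), y, rcond=None)
    A = a.reshape(m, m)
    return (A + A.T) / 2                       # remove round-off asymmetry

def predict(A, X, triplets):
    """+1 if S_ij > S_ik (j predicted more similar to i), else -1."""
    return np.array([1 if X[:, i] @ A @ X[:, j] > X[:, i] @ A @ X[:, k] else -1
                     for i, j, k in triplets])

# Usage on simulated data (random features and a random "true" A; not the study data).
rng = np.random.default_rng(1)
m, p = 5, 30
X = rng.normal(size=(m, p))
B = rng.normal(size=(m, m))
A_true = (B + B.T) / 2
triplets = [tuple(rng.choice(p, 3, replace=False)) for _ in range(2000)]
y = predict(A_true, X, triplets)               # noise-free simulated judgments
A_hat = fit_A(X, triplets, y)
train_accuracy = np.mean(predict(A_hat, X, triplets) == y)
```

Because the targets are ±1 rather than similarity differences, $\hat{A}$ is only determined up to how well a linear fit can match the signs; the prediction uses only the sign of $S_{ij} - S_{ik}$.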


Taking this thinking a step further, we could consider many different optimizations of the type above, each using a different subset of the features, in order to determine which features are most predictive of student judgments. Indeed, some features may be totally irrelevant and worsen, rather than help, the prediction of students' similarity judgments. Unfortunately, searching over all possible subsets of features is computationally infeasible, so we instead solve an optimization that approximates this search problem, called sparse COMET [33].
This optimization method uses a cost function that consists of two terms. The first term is the least-squares data-fitting cost from the previous optimization. The second term is a Group LASSO penalty, which encourages solutions that have many columns equal to 0. If a column in $A$ is all zero, then the corresponding feature is not used for prediction. The number of zero-valued columns in the solution depends on the penalty parameter $\lambda > 0$. Note that we recover the previous optimization when $\lambda = 0$. Larger values of $\lambda$ produce sparser solutions that effectively use fewer features; features crucial for prediction are excluded only if $\lambda$ is exceedingly large.
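Written out, the cost function just described takes roughly the following form, where $\mathbf{a}_l$ denotes the $l$th column of $A$. This is our reconstruction from the verbal description (least-squares data term plus a Group LASSO penalty over the columns of $A$); the exact formulation in [33] may differ, for example in the loss or in how the groups are defined.

$$\hat{A} = \arg\min_{A \in \mathcal{S}_m} \sum_{n=1}^{N} \left( y_n - \mathbf{x}_{i_n}^T A \mathbf{x}_{j_n} + \mathbf{x}_{i_n}^T A \mathbf{x}_{k_n} \right)^2 + \lambda \sum_{l=1}^{m} \lVert \mathbf{a}_l \rVert_2$$

Setting $\lambda = 0$ removes the penalty and recovers the plain least-squares problem above.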


The second step in this analysis was to tune the parameter $\lambda$ and then to assess the prediction accuracy of our method. To this end, we used 10-fold cross validation. Specifically, we randomly split the complete dataset into 10 equal-sized subsets. We removed 2 random subsets as hold-out data and kept the remaining data as training data. We then solved the optimization above with the training data over a range of different $\lambda$ values. For each $\lambda$, we scored prediction accuracy on one of the two hold-out subsets to select the optimal value. Then, using our chosen $\lambda$ value, we solved the optimization again to obtain a final $A$ using 9/10 of the data, and assessed the prediction accuracy on the remaining 1/10 of the data.
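The bookkeeping of this split can be sketched as follows, under our reading that one hold-out subset selects $\lambda$ and the other reports final accuracy. The callables `fit(train_idx, lam)` and `accuracy(model, eval_idx)` are hypothetical placeholders for the sparse COMET solver and its scoring; only the fold logic is shown.

```python
import numpy as np

def select_lambda_and_test(n, lambdas, fit, accuracy, n_folds=10, seed=0):
    """Choose lambda on one held-out fold, refit on 9/10, report accuracy on the last 1/10."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)  # 10 (nearly) equal random subsets
    val_idx, test_idx = folds[0], folds[1]               # the two held-out subsets
    train_idx = np.concatenate(folds[2:])                # remaining 8/10 for training

    # Score each candidate lambda on the validation subset.
    scores = [accuracy(fit(train_idx, lam), val_idx) for lam in lambdas]
    best = lambdas[int(np.argmax(scores))]

    # Refit with the chosen lambda on 9/10 of the data; evaluate on the final 1/10.
    final_model = fit(np.concatenate([train_idx, val_idx]), best)
    return best, accuracy(final_model, test_idx)
```

Passing the solver in as a function keeps the split independent of any particular optimizer.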


The final step was to rank the features based on their weights in matrix $A$. Due to the Group LASSO penalty in the loss function, many of the columns in the resulting matrix are zero. To obtain the aggregate weight of each relevant feature, we computed the length (Euclidean norm) of each non-zero column and ranked the features accordingly.
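A minimal sketch of this ranking step is shown below; the example matrix and feature names are made up. Whether the paper's reported weights were further normalized (e.g., as a share of the total) is not stated here, so the sketch returns raw norms.

```python
import numpy as np

def rank_features_by_column_norm(A, feature_names):
    """Rank features by the Euclidean norm of their column in A, skipping all-zero columns."""
    norms = np.linalg.norm(A, axis=0)   # one norm per column, i.e., per feature
    order = np.argsort(-norms)          # largest norm first
    return [(feature_names[c], float(norms[c])) for c in order if norms[c] > 0]

# Usage with a small symmetric matrix in which feature "d" was zeroed out.
A = np.array([[0, 0, 3, 0],
              [0, 0, 4, 0],
              [3, 4, 0, 0],
              [0, 0, 0, 0]], dtype=float)
print(rank_features_by_column_norm(A, ["a", "b", "c", "d"]))
# [('c', 5.0), ('b', 4.0), ('a', 3.0)]
```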


3.2.2 Approach 2: Ordinal Embedding 

In this approach, rather than directly making predictions of simi- 
larity based on feature vectors and triplet responses, we first used 
students’ similarity judgments to learn an embedding that spatially 
represents the similarity of molecule representations as distances 
in 2-dimensional space. We then identified molecule vectors that 
account for the distribution of molecule representations in the 
embedding space. 


The first step in this analysis was to learn an embedding. We 
applied non-metric multidimensional scaling (NMDS) to the 
26,180 triplet comparison responses collected from the experi- 
ment to learn an embedding of the 50 molecule representations in 
a two-dimensional space [22]. Embedding in two dimensions 
allows visualizing the perceived similarity computed by NMDS. 
The embedding reflects the consensus among students as to which 
molecular representations were more or less similar. We created 
50 different embeddings, using multiple random initializations per 
embedding in order to account for the non-convexity of NMDS. 
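The study used NMDS. To convey the underlying idea of learning coordinates from triplet judgments alone, the sketch below uses a simpler stand-in: plain gradient descent on the hinge loss $\max(0, 1 + \lVert y_i - y_j \rVert^2 - \lVert y_i - y_k \rVert^2)$ for each judgment "i is more similar to j than to k". This is not NMDS and not the implementation used in the study; the loss, step size and iteration count are assumptions. Because the loss is non-convex, repeated runs with different `seed` values play the role of the multiple random initializations described above.

```python
import numpy as np

def triplet_embedding(triplets, n, dim=2, lr=0.5, iters=1000, seed=0):
    """Gradient descent on a hinge triplet loss (a stand-in for NMDS, not NMDS itself).

    Each triplet (i, j, k) means "i was judged more similar to j than to k".
    Minimizes mean(max(0, 1 + |Y_i - Y_j|^2 - |Y_i - Y_k|^2)) over coordinates Y.
    """
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=0.1, size=(n, dim))       # random initialization
    T = np.asarray(triplets)
    i, j, k = T[:, 0], T[:, 1], T[:, 2]
    for _ in range(iters):
        d_ij = np.sum((Y[i] - Y[j]) ** 2, axis=1)
        d_ik = np.sum((Y[i] - Y[k]) ** 2, axis=1)
        active = 1 + d_ij - d_ik > 0                # triplets violating the margin
        a, b, c = i[active], j[active], k[active]
        G = np.zeros_like(Y)
        # Gradient of (d_ij - d_ik) with respect to Y_i, Y_j and Y_k, summed per point.
        np.add.at(G, a, 2 * (Y[c] - Y[b]))
        np.add.at(G, b, 2 * (Y[b] - Y[a]))
        np.add.at(G, c, 2 * (Y[a] - Y[c]))
        Y -= lr * G / len(T)
    return Y
```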


The second step was to validate the embedding. To this end, we 
computed a distance matrix for each embedding. To validate the 
distance matrices, we used the following cross-validation proce- 
dure. We selected 6000 triplet comparison responses uniformly at 
random to serve as a hold-out dataset. From the remaining triplets, 
we randomly selected training sets of different size, ranging from 
1000 to 20,000 triplet comparison responses. We computed an embedding for each training set and then used the distances in each embedding as a predictor of students' judgments in the hold-out set; the prediction errors quantify how well the embedding reflects the judgments. We repeated this procedure for training sets of different sizes and performed 50-fold cross validation to calculate the average prediction error of the learned embeddings. This procedure allowed us to assess how prediction performance relates to the training set size (i.e., how many triplets were used to compute an embedding).
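The validation loop can be sketched as follows. Here `embed` is a hypothetical placeholder for any function that maps a list of triplets to an n × d coordinate array. This sketch redraws the hold-out set on each repeat, which is only one possible reading of the 50-fold procedure.

```python
import numpy as np

def holdout_accuracy(Y, holdout):
    """Fraction of held-out judgments (i, j, k) for which Y[i] is closer to Y[j] than to Y[k]."""
    T = np.asarray(holdout)
    d_ij = np.linalg.norm(Y[T[:, 0]] - Y[T[:, 1]], axis=1)
    d_ik = np.linalg.norm(Y[T[:, 0]] - Y[T[:, 2]], axis=1)
    return float(np.mean(d_ij < d_ik))

def learning_curve(triplets, embed, sizes, n_holdout, n_repeats, seed=0):
    """Mean hold-out accuracy for each training-set size, averaged over repeats."""
    rng = np.random.default_rng(seed)
    T = np.asarray(triplets)
    results = {}
    for size in sizes:
        accs = []
        for _ in range(n_repeats):
            perm = rng.permutation(len(T))
            hold, pool = T[perm[:n_holdout]], T[perm[n_holdout:]]
            accs.append(holdout_accuracy(embed(pool[:size]), hold))
        results[size] = float(np.mean(accs))
    return results
```

In the study's setting, the hold-out set would contain 6000 judgments and `sizes` would range from 1000 to 20,000.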


The third step in our analysis, after validating our embedding 
procedure, was to compute an embedding and corresponding 
distance matrix from the full set of triplets. Since the distance 
between points in the embedding corresponds to their perceived 
dissimilarity, we computed a similarity matrix defined as the 
element-wise inverse of the distance matrix, scaled from 0 to 1. 
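A minimal sketch of this conversion is shown below. The text does not say how the zero-distance diagonal was handled; this sketch sets it to 1 (maximal similarity) as an assumption, and it assumes no two molecule representations share exactly the same position in the embedding.

```python
import numpy as np

def similarity_from_embedding(Y):
    """Element-wise inverse of pairwise distances, min-max scaled to [0, 1].

    Assumptions: the diagonal is set to 1, and all off-diagonal distances are
    positive and not all equal (otherwise the scaling would divide by zero).
    """
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)   # pairwise distances
    off = ~np.eye(len(Y), dtype=bool)                           # off-diagonal mask
    inv = np.zeros_like(D)
    inv[off] = 1.0 / D[off]
    lo, hi = inv[off].min(), inv[off].max()
    S = np.ones_like(D)
    S[off] = (inv[off] - lo) / (hi - lo)
    return S

# Usage: three points on a line at positions 0, 1 and 3.
S = similarity_from_embedding(np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]))
```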


The fourth step was to identify which features, represented by their molecule vectors, drive students' similarity judgments. Because the embedding was performed in 2 dimensions, we can restrict the problem to choosing only 2 molecule vectors to combine, and compare combinations of pairs of molecule vectors to the similarity matrix. For each possible pair, we performed a least squares optimization to find the ideal uniform scaling to match an outer product of the pair of molecule vectors to the similarity matrix:
$$\hat{A} = \arg\min_{A} \sum_{i,j=1}^{p} \left( S_{ij} - \mathbf{x}_i^T A \mathbf{x}_j \right)^2$$

subject to $A_{st} = 0$ for all $(s,t)$ other than $(k,l)$ and $(l,k)$. In other words, only the $(k,l)$ elements of $A$ may be non-zero, and only these are optimized. This equates to fitting $S$ to the molecule vectors for features $k$ and $l$. Here, $S_{ij}$ represents the value of the perceptual similarity between molecules $i$ and $j$ from the embedding. The magnitude of the resulting value of $A_{kl}$ tells us how important the interaction of features $k$ and $l$ is in representing the similarity. This is essentially a correlation coefficient, and it only gauges the marginal value of this interaction (i.e., in isolation from all other interactions). In each case, after learning a matrix $A$, we computed the corresponding residual between the similarity matrix $S$ and our combination of 2 features. After fitting all possible pairs of features, we ranked the pairs in ascending order of residual values, with the smallest residuals being the best approximation of our observed similarity matrix. To evaluate the feature rankings, we used 10-fold cross-validation by performing identical tests on 10 different similarity matrices computed from different embeddings based on equal numbers of triplets, to ensure that neither the original embedding nor the non-convexity of NMDS was a factor in the final ranking of feature pairs.
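With only $A_{kl} = A_{lk} = a$ non-zero, the model reduces to $S_{ij} \approx a\,(r_k[i]\,r_l[j] + r_l[i]\,r_k[j])$ (or $a\,r_k[i]\,r_k[j]$ when $k = l$), so each pair needs only a one-dimensional least-squares fit. The sketch below implements this pairwise ranking on a small made-up example; the function names are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def pair_fit(S, X, k, l):
    """Fit S with only A_kl = A_lk non-zero (A_kk if k == l); return (weight, residual)."""
    r_k, r_l = X[k], X[l]                               # molecule vectors = rows of X
    if k == l:
        B = np.outer(r_k, r_k)                          # x_i^T A x_j = A_kk r_k[i] r_k[j]
    else:
        B = np.outer(r_k, r_l) + np.outer(r_l, r_k)     # A_kl (r_k[i] r_l[j] + r_l[i] r_k[j])
    denom = np.sum(B * B)
    weight = np.sum(S * B) / denom if denom > 0 else 0.0  # closed-form 1-D least squares
    residual = float(np.sum((S - weight * B) ** 2))
    return weight, residual

def rank_feature_pairs(S, X, names):
    """Rank all feature pairs (including a feature with itself) by ascending residual."""
    rows = []
    for k, l in combinations_with_replacement(range(X.shape[0]), 2):
        weight, residual = pair_fit(S, X, k, l)
        rows.append((residual, names[k], names[l], float(weight)))
    return sorted(rows)

# Usage: a similarity matrix generated exactly by features f0 and f2.
X = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 0.0, 1.0]])
S = 0.7 * (np.outer(X[0], X[2]) + np.outer(X[2], X[0]))
ranked = rank_feature_pairs(S, X, ["f0", "f1", "f2"])   # ranked[0] is the (f0, f2) pair
```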


4. RESULTS 
4.1 Identifying Important Visual Features 


To address research question 1, we used the two similarity learn- 
ing approaches just described to identify which visual features 
account for students’ similarity judgments. 


4.1.1 Approach 1: Similarity Learning by Ranking 
Recall that the first approach entailed learning a similarity func- 
tion that describes students’ perceived similarity between mole- 
cule representations. This approach yielded an average 69% pre- 
diction accuracy of students’ similarity judgments (assessed via 
10-fold cross validation). This finding indicates that there was 
consensus over which representations were more or less similar, 
but also that there were some disagreements among students’ 
similarity judgments. 


To identify which visual features account for students' similarity judgments, we estimated the weights for each feature in the machine-learned matrix $A$. The stronger a feature's weight in $A$, the more this feature affected students' similarity judgments. Hence, the feature's weight corresponds to its saliency in students' perception of molecule representations.

Table 1. Top 10 features from the ranking of features with strong weights obtained by Approach 1.

Feature | Avg. weight
Distinct letters | 4.50%
Single bonds between Oxygen and Hydrogen | 3.45%
180-degree angle in Hydrogen-Carbon-Fluorine | 3.16%
Double bonds between Oxygen and Nitrogen | 3.03%
Number of Nitrogen atoms | 2.99%
Double bonds between Carbon and Oxygen | 2.78%
120-degree angle in Hydrogen-Carbon-Hydrogen | 2.73%
Number of Oxygen atoms | 2.64%
180-degree angle in Carbon-Carbon-Oxygen | 2.62%
Single bonds between Carbon and Oxygen | 2.37%


Table 1 shows the 10 most important features, as determined by a 
ranking of features according to their aggregate weight computed 
from matrix A. These results show that the most highly ranked 
feature is the number of distinct letters, which corresponds to an 
aggregate educated guess feature. Specific visual features that are 
relevant to organic molecules were also ranked highly (e.g., the 
number of single bonds between Oxygen and Hydrogen atoms, 
the number of bonds between Carbon and Oxygen, the number of 
Nitrogen and Oxygen atoms). These specific visual features were 
present in many of the molecules in our dataset. Several visual 
features also included geometric aspects, specifically bond angles. 
These features indicate the presence of chemical functional groups that are relevant to predicting a molecule's reactive behavior.


4.1.2 Approach 2: Ordinal Embedding 

Recall that Approach 2 learns an embedding that represents the similarity of molecule representations as distances in a $d$-dimensional space, from which we then extracted the most important features. First, we established how many dimensions we need to consider (i.e., which $d$ to choose in representing the similarity of molecule representations in a $d$-dimensional space). Using the 50-fold cross validation process described above, we calculated 1- through 20-dimensional embeddings of perceptual similarity. We used 20,000 triplets in this computation to ensure that the number of triplets did not limit prediction accuracy as the dimension became large. Figure 4 shows that there is no drop in prediction accuracy when embedding in low dimensions versus high dimensions, suggesting that perceptual similarity can be accurately represented in a low-dimensional subspace and that there is a high degree of redundancy in the data. The prediction accuracy of about 70% also suggests that students' responses agreed on relative similarity about 70% of the time.


Next, we generated a 2-dimensional embedding that describes 
students’ perceived similarity between the molecule representa- 
tions. Figure 5 shows this embedding, illustrating that molecules 
naturally form clusters based on their perceptual similarity. These 
clusters correspond to specific chemical properties shared among 
the molecules, such as the presence of a particular type of bond or a
functional group. We color-coded and labeled some of these clus- 
ters to illustrate these characteristics of students’ perceptions. This 
illustration lends face validity to our embedding approach. 


From this embedding, we extracted an ordered list of the feature pairs that best capture students' similarity judgments, shown in Table 2. The feature pairs in this table were ranked based on how well they approximate the similarity matrix computed from the embedding in Figure 5. The same feature may appear twice in a pair to account for the possibility that a weighted combination of a feature with itself better reflects the observed similarity structure than does a pair of features. In sum, these results show that the most highly ranked features are general visual features, which correspond to the aggregate educated guess features (e.g., number of letters, number of lines). Specific visual features that are relevant to hydrocarbon molecules were also ranked highly (e.g., the number of Carbon and Hydrogen atoms). These specific features were present in many of the molecules in our dataset.

Figure 4. Prediction accuracy on hold-out set by number of dimensions in embedding.

Table 2. Top 10 feature pairs from Approach 2. Each row corresponds to a pair of feature vectors ranked in accordance with how accurately they described the observed similarity structure from the embedding.

Rank | Feature pair
1 | Distinct letters & Distinct letters
2 | Total letters & Distinct letters
3 | Distinct letters & Single bonds
4 | Total bonds & Distinct letters
5 | Distinct letters & Carbons
6 | Hydrogens & Distinct letters
7 | Total letters & Total letters
8 | Total letters & Single bonds
9 | Total letters & Unbonded electrons
10 | Distinct letters & Single Carbon-Hydrogen bonds


4.1.3 Comparing the Similarity Learning Approaches
While both methods agreed upon the top-ranked feature, the similarity learning by ranking approach ranked structural features of the representations that were relevant to hydrocarbons and organic molecules more highly. Because the ranking from this method is based on predictive power, this ranking indicates that students' judgments of similarity can best be predicted, and therefore explained, through a combination of the number of different letters and the structural features involving Carbon, Hydrogen, and Oxygen.


4.2 Comparison with “Educated Guesses”

To address research question 2 (do the visual features we identified as salient via metric learning correspond to visual features that students are expected to attend to?), we compared the results from the similarity learning approaches to the educated guess features that we had determined based on the expert-novice literature on perceptual learning. Overall, the results from both metric learning approaches agree with the educated guesses: aggregate features that describe general visual features were ranked to be most important by both approaches. The similarity learning by ranking approach also yielded a number of visual features that are specific to the types of molecules in our corpus; in particular, visual features that are highly relevant for comparing organic molecules.

[Figure 5 image: labeled clusters include SP2 Hybridization, Diatomic Covalent, Alkenes and Alkynes, and Hydroxyl Groups.]
Figure 5. 2-dimensional similarity embedding from Approach 2. Distances between molecule representations correspond to students' perceptions of dissimilarity (i.e., molecule representations that are depicted close to one another are perceived to be similar).

Figure 6. Prediction accuracy on hold-out set by number of triplet comparison judgments used in the training set.

Proceedings of the 9th International Conference on Educational Data Mining 203


4.3 Number of Similarity Judgments Needed 

We addressed our methodological research question 3 (how many similarity judgments are needed to assess students’ perceptual knowledge) with the ordinal embedding approach. Specifically, we tested how many triplet comparisons are required to compute a representative embedding of the underlying similarity. Figure 6 shows that gains in prediction accuracy of the embedding were no longer statistically significant beyond 7,000 triplet comparisons.
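To make the hold-out evaluation concrete, here is a minimal sketch of how an embedding could be scored on held-out triplets. It assumes each triplet (i, j, k) records that a student judged molecule i more similar to molecule j than to molecule k, and that the embedding is a list of coordinates; the function name `triplet_accuracy`, this encoding, and the toy coordinates are hypothetical and do not reproduce our actual pipeline.

```python
import math
from typing import Sequence, Tuple

# Hypothetical encoding: (i, j, k) means "i was judged more similar to j than to k".
Triplet = Tuple[int, int, int]


def triplet_accuracy(embedding: Sequence[Sequence[float]],
                     triplets: Sequence[Triplet]) -> float:
    """Fraction of held-out triplets whose judgment the embedding reproduces.

    A triplet counts as correctly predicted when item i lies strictly closer
    to item j than to item k (Euclidean distance in the embedding).
    """
    if not triplets:
        raise ValueError("need at least one triplet")
    correct = 0
    for i, j, k in triplets:
        d_ij = math.dist(embedding[i], embedding[j])
        d_ik = math.dist(embedding[i], embedding[k])
        if d_ij < d_ik:
            correct += 1
    return correct / len(triplets)


# Toy 2-D embedding of four molecules placed on a line (illustrative values).
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
held_out = [(0, 1, 2), (1, 0, 3), (2, 1, 3), (3, 2, 0), (0, 3, 1)]
print(triplet_accuracy(coords, held_out))  # 0.8 (the last triplet disagrees)
```

Retraining the embedding on successively larger training sets and scoring each one on the same hold-out triplets would yield an accuracy-by-training-size curve of the kind plotted in Figure 6; the significance test for when gains level off is not shown here.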


4.4 Differences Between the Two Approaches 
The two methods are different and potentially complementary. There is no definitively correct way to fit the common model $S_{ij} = x_i^\top A x_j$ to data. The main differences in the final rankings they produce stem from how we learn the matrix $A$ and the restrictions we place on its structure. In Approach 1, we work directly with triplet responses, which may be noisy because students’ individual judgments of perceptual similarity disagree, but we place fewer restrictions on the learned matrix, allowing for more feature interaction. In Approach 2, NMDS is useful for capturing perceived similarity in aggregate, but we enforce much stronger restrictions on the structure of $A$, namely that only two features may interact at once, which gives a clearer picture of the importance of a pair of features.
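Both approaches fit the bilinear model $S_{ij} = x_i^\top A x_j$, where $x_i$ is the visual feature vector of molecule $i$. The sketch below, offered under stated assumptions, contrasts an unrestricted $A$ with a copy in which only one pair of features keeps its weights, a simplified stand-in for the two-feature restriction described for Approach 2. The helper names, feature vectors, and weights are hypothetical, and the sketch does not show how either approach estimates $A$ from triplets.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def bilinear_similarity(x_i: Sequence[float], A: Matrix, x_j: Sequence[float]) -> float:
    """Return S_ij = x_i^T A x_j."""
    return sum(x_i[p] * A[p][q] * x_j[q]
               for p in range(len(x_i))
               for q in range(len(x_j)))


def pair_restricted(A: Matrix, p: int, q: int) -> Matrix:
    """Hypothetical helper: copy of A keeping only entries that link features p and q."""
    keep = {p, q}
    return [[A[r][c] if r in keep and c in keep else 0.0 for c in range(len(A[r]))]
            for r in range(len(A))]


def predicts_closer(x_i, x_j, x_k, A: Matrix) -> bool:
    """True if the model judges item i more similar to item j than to item k."""
    return bilinear_similarity(x_i, A, x_j) > bilinear_similarity(x_i, A, x_k)


# Hypothetical 3-feature vectors and weights, chosen only for illustration.
A_full = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 2.0]]
x_i, x_j, x_k = [1.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]

print(predicts_closer(x_i, x_j, x_k, A_full))                        # True
print(predicts_closer(x_i, x_j, x_k, pair_restricted(A_full, 0, 1)))  # False
```

With the unrestricted matrix the third feature drives the prediction; restricting $A$ to features 0 and 1 reverses it. This illustrates the trade-off described above: the restricted form is easier to interpret as the contribution of one feature pair, at the cost of ignoring other interactions.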


If we had to recommend one approach, we would choose the regression approach (Approach 1) because it optimizes prediction error, which is an objective measure of model quality. The embedding approach (Approach 2) has its own potential virtues: the low-dimensional embedding provides an implicit form of regularization that may be especially helpful when the amount of response data is small, and the embedding provides a visual representation of perceptual similarities, which aids model interpretation.


5. DISCUSSION 


We applied similarity learning approaches to assess which visual features students focus on when presented with visual representations. We compared two approaches: one that allows us to assess the predictive power of the identified features, and one that represents the perceived similarity in a d-dimensional space. Both approaches yield similar results as to which visual features are salient to students. Hence, both approaches address research question 1: Which visual features do students focus on when presented with visual representations? We found that students’ similarity judgments of Lewis structures appear to be driven by general visual features such as the number of total and distinct letters, as well as by visual features specific to the types of molecules in our dataset (e.g., number of Hydrogen / Carbon atoms).


Our results also address research question 2: Do the visual features we identified as salient via similarity learning correspond to visual features that students are expected to attend to based on the expert-novice literature on perceptual learning? We found that the identified general visual features align with educated guesses based on the literature on expertise and perceptual learning, which validates the common “educated guess” approach that instructional designers have to rely on in the absence of assessments of perceptual knowledge. Our results also suggest that, in addition to these general features, students learn to pay attention to key visual features that are highly domain-specific, such as features that indicate the presence of functional groups that are predictive of chemical behaviors. Furthermore, our results show that a few key features predict students’ perceptions of similarity between visual representations with an accuracy of about 70%.


Finally, we addressed our methodological research question 3: How many similarity judgments do we need to assess students’ perceptual knowledge? Our results show that about 7,000 responses to triplet comparison tasks are sufficient to assess a population’s perceptual knowledge. Using a survey with 50 triplet comparison tasks (as in our experiment), this means that an N of 140 participants (7,000 / 50) will yield valid assessments of perceptual knowledge.
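The participant estimate follows from dividing the required number of pooled triplet responses by the number of tasks per survey and rounding up. A minimal sketch, assuming (as the estimate above does) that every participant completes the same number of tasks and that all responses are pooled; the function name is hypothetical, and the roughly 7,000-response threshold comes from our dataset and may differ for other stimulus sets.

```python
def participants_needed(total_triplets: int, tasks_per_participant: int) -> int:
    """Smallest number of participants whose pooled responses reach total_triplets."""
    if tasks_per_participant <= 0:
        raise ValueError("tasks_per_participant must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_triplets // tasks_per_participant)


print(participants_needed(7000, 50))  # 140, as in our survey design
print(participants_needed(7000, 60))  # 117: a longer survey needs fewer people
```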


6. LIMITATIONS 


Although both similarity learning approaches have rigorous theoretical backing, we made a few assumptions about our triplet comparison data that carry notable limitations. In both methods, we model the population as a whole rather than individual students. Consequently, we assume that the triplets, and therefore the judgments of similarity, are independent of one another. This assumption allows us to learn rankings of features and feature pairs for the students collectively, but it does not provide a ranking for an individual. Further, because judging the similarity of representations is a subjective task, students’ judgments may in certain cases conflict with one another. Even with an extremely large number of similarity judgments, complete consensus is unlikely, and therefore perfect prediction of student judgments is unlikely to be achievable. Hence, future research needs to investigate how to expand the present approach to modeling individual perceptual knowledge.


Another limitation pertains to the ordinal embedding procedure. For visualization purposes, we embedded the molecules into a 2-dimensional space. Higher-dimensional embeddings may capture perceptual dissimilarities more accurately. Future research should explore this question.


7. FUTURE DIRECTIONS 


We will expand our research to other types of visual representa- 
tions typically used in chemistry instruction (see Figure 1). Fur- 
ther, we will gather data from expert chemists and compare them 
to data from novices and advanced learners. Based on this com- 
parison, we will identify a “perceptual knowledge gap” between 
students and experts. Specifically, we will identify visual features 
that experts attend to but students do not. 


Further, we will expand similarity learning so that it can assess an individual student’s perceptual knowledge in real time. The current approach is limited in two ways: it requires a large number of similarity judgments, which is feasible only when we are interested in assessing the perceptual knowledge of a population of interest (e.g., novices, advanced students, experts), and it assumes independence among similarity judgments. To address these limitations, we will combine our similarity learning approach with cognitive modeling methods (e.g., Bayesian knowledge tracing). For example, a similarity judgment survey may provide a prior for a cognitive model, and students’ performance on perceptual learning tasks may inform the choice of representations for a small number of similarity judgment tasks interspersed in the learning activity.
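To illustrate how a survey-derived prior could enter a cognitive model, here is a minimal sketch of one standard Bayesian knowledge tracing (BKT) update, applied to a single visual feature. The assumptions are ours: that a population-level survey estimate would serve as the initial probability that a student attends to the feature, and that slip, guess, and transit take the illustrative values shown. None of these numbers come from our data, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    slip: float     # P(incorrect response | feature already learned)
    guess: float    # P(correct response | feature not yet learned)
    transit: float  # P(feature becomes learned after one practice opportunity)


def bkt_update(p_known: float, correct: bool, params: BKTParams) -> float:
    """One standard BKT step: Bayesian posterior given the response, then learning."""
    s, g = params.slip, params.guess
    if correct:
        evidence_known = p_known * (1 - s)
        evidence_unknown = (1 - p_known) * g
    else:
        evidence_known = p_known * s
        evidence_unknown = (1 - p_known) * (1 - g)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # Allow for learning that may occur on this practice opportunity.
    return posterior + (1 - posterior) * params.transit


# Hypothetical: 0.3 stands in for a prior a similarity survey might supply.
params = BKTParams(slip=0.1, guess=0.2, transit=0.1)
p = 0.3
for response in [True, True, False, True]:
    p = bkt_update(p, response, params)
print(round(p, 3))
```

In a full system, the estimated probability for each feature could then guide which perceptual learning tasks, or which few similarity judgment tasks, to present next.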


This expansion will provide the basis for the design of adaptive instruction for perceptual knowledge that can provide appropriate sequences of perceptual learning tasks that draw students’ attention to visual features they have yet to learn. Further, knowing which visual features students have not yet learned can serve as a basis for the design of visual feedback that highlights visual features when students make mistakes on perceptual learning tasks.


In sum, we will use the similarity learning approach described in 
this paper both to design instruction for perceptual learning and to 
assess perceptual knowledge as a learning outcome. 


8. CONCLUSIONS 


This paper described a new approach to assess students’ perceptu- 
al knowledge. We used this approach to validate the “educated 
guesses” approach. In addition, we offer more formal pathways 
for instructional designers to create perceptual learning assess- 
ments. Because developing adaptive instruction for perceptual 
knowledge relies on such assessments, this paper makes an im- 
portant contribution to cognitive modeling research. 


This paper also makes important contributions to machine learn- 
ing. We provide a new mathematical approach to quantify the 
accuracy of perceptual embeddings learned from similarity judg- 
ments. Specifically, we derived bounds on the accuracy of 
embeddings learned from small numbers of comparative judg- 
ments by adapting recently developed large-sample analysis 
methods [34]. This approach provided new algorithms for gener- 
ating embeddings that are provably accurate. We investigated new 
methods for embedding based on spectral methods inspired by 
spectral ranking algorithms [35]. Our experiment yielded an em- 
pirical validation with perceptual data from undergraduates, as 
well as new machine learning methods to assess how visual fea- 
tures predict or encode perceptual similarity judgments. Specifi- 
cally, we explored the application of group Lasso algorithms for 
automatically selecting the most perceptually salient features [36]. 
Our experiment empirically evaluated the group Lasso approach. 
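The group Lasso selects features because its proximal step shrinks each feature’s whole group of weights together and can set a group exactly to zero. The sketch below shows only that step (group soft-thresholding), assuming one group of weights per visual feature (hypothetically, the entries of $A$ that involve that feature); a full solver would alternate it with gradient steps on the model’s prediction loss. The feature weights are invented for illustration and are not estimates from our data.

```python
import math
from typing import Dict, List


def group_soft_threshold(groups: Dict[str, List[float]], lam: float) -> Dict[str, List[float]]:
    """Proximal operator of lam * sum_g ||w_g||_2 (group soft-thresholding).

    Each group's weight vector is scaled by max(0, 1 - lam / ||w_g||_2), so a
    group whose norm is at most lam becomes exactly zero (feature deselected).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    shrunk = {}
    for name, w in groups.items():
        norm = math.sqrt(sum(v * v for v in w))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        shrunk[name] = [scale * v for v in w]
    return shrunk


def selected_features(groups: Dict[str, List[float]], lam: float) -> List[str]:
    """Names of features whose group survives the thresholding step."""
    shrunk = group_soft_threshold(groups, lam)
    return sorted(name for name, w in shrunk.items() if any(v != 0.0 for v in w))


# Invented weights for illustration only.
weights = {
    "total letters": [0.9, 0.4],
    "distinct letters": [0.6, 0.8],
    "unbonded electrons": [0.1, 0.05],
}
print(selected_features(weights, lam=0.2))  # ['distinct letters', 'total letters']
```

Raising the penalty removes more features, so sweeping it gives a rough order in which features drop out, which can be read as one possible saliency ranking.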


In sum, our work provides a crucial stepping stone towards adap- 
tive instruction for perceptual knowledge. Perceptual knowledge 
is by definition implicit and does not lend itself to the kinds of 
techniques used in traditional cognitive modeling approaches 
(e.g., think-alouds, interviews). We presented and evaluated two 
similarity learning approaches that can determine which visual 
features students attend to when perceiving visual representations. 